Perakath Benjamin, Knowledge Based Systems Inc., pbenjamin@kbsi.com (PRIMARY)
Karthic Madanagopal, Knowledge Based Systems Inc., kmadanagopal@kbsi.com
Kumar Akella, Knowledge Based Systems Inc., kakella@kbsi.com
Kalyan Vadakkeveedu, Knowledge Based Systems Inc., kvadakkeveedu@kbsi.com
Student Team: NO
Did you use data from both mini-challenges? NO
- Intelligence Products Mosaic: This semantic framework supports information discovery, sense making, and presentation in a dynamic, collaborative environment. This technology reduces ‘data-to-decision’ time through the use of semantic and collaborative visual analytics techniques.
- D3: We built our custom data explorer using the D3.js visualization library, which dynamically generates SVGs from data.
- Microsoft SQL Server 2008: To make the exploration scalable, we loaded the MC1 movement data into Microsoft SQL Server and wrote SQL queries to extract useful statistics.
- Microsoft Office Excel®: The statistics extracted using SQL queries were loaded into Excel and analyzed in detail by creating various plots to validate our hypotheses.
- MATLAB®: We performed data exploration, visualization, and statistical analysis on the MC1 data using algorithms written in MATLAB.
- NodeXL: The plotting capabilities of the NodeXL plugin for Microsoft Excel were used to visualize the data and the results of our analysis algorithms.
Approximately how many hours were spent working on this submission in total? 120 hours
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? YES
Video Download
Video:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Questions
MC1.1 – Characterize the attendance at DinoFun World on this weekend. Describe up to twelve different types of groups at the park on this weekend.
a. How big is this type of group?
b. Where does this type of group like to go in the park?
c. How common is this type of group?
d. What are your other observations about this type of group?
e. What can you infer about this type of group?
f. If you were to make one improvement to the park to better meet this group’s needs, what would it be?
Limit your response to no more than 12 images and 1000 words.
1a.
We built a visual analytic interface to explore the
movement data for all three days and identified the various group types. Figure 1
shows how groups (individuals traversing together in the park) of different
sizes stack up relative to each other for Saturday.
Figure 1: Saturday Visitors by Group Size
Here there are 20 different group types, ranging in size from 3 to 43. For each group size, the number of instances (distinct occurrences) is computed. For example, Group Size 4 has 314 instances, which account for 14.95% of the total number of group instances visiting the park on Saturday. Each segmented area thus indicates how big the group type is. Group Sizes 30 to 43 each have an instance percentage of 0.05%. Figure 2 and Figure 3 show the group types identified for Friday and Sunday.
Figure 2: Friday Visitors by Group Size
Figure 3: Sunday Visitors by Group Size
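The grouping and counting described in 1a can be sketched in a few lines. This is a minimal illustration, not the code used for this submission: it assumes that a group is a set of visitors whose check-in sequences are identical, and the record layout (visitor_id, timestamp, attraction_id), the function names, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def find_groups(checkins):
    """Group visitor IDs whose check-in sequences are identical.

    checkins: iterable of (visitor_id, timestamp, attraction_id) tuples.
    This layout is an assumption for illustration, not the MC1 schema.
    """
    sequences = defaultdict(list)
    # Sort by visitor, then time, so each visitor's sequence is chronological.
    for visitor_id, timestamp, attraction_id in sorted(
        checkins, key=lambda r: (r[0], r[1])
    ):
        sequences[visitor_id].append((timestamp, attraction_id))

    # Visitors sharing exactly the same sequence form one group instance.
    groups = defaultdict(list)
    for visitor_id, seq in sequences.items():
        groups[tuple(seq)].append(visitor_id)
    return [sorted(members) for members in groups.values()]


def share_by_group_size(groups):
    """Return {group_size: (instance_count, percent_of_all_instances)}."""
    counts = Counter(len(members) for members in groups)
    total = sum(counts.values())
    return {size: (n, 100.0 * n / total) for size, n in sorted(counts.items())}


# Toy data (hypothetical): visitors 1-3 move together, visitors 4-5 move together.
toy = [
    (1, "09:00", 5), (2, "09:00", 5), (3, "09:00", 5),
    (1, "09:30", 8), (2, "09:30", 8), (3, "09:30", 8),
    (4, "09:10", 13), (5, "09:10", 13),
]
groups = find_groups(toy)
shares = share_by_group_size(groups)
```

Exact sequence matching is the strictest possible grouping rule. A member who skips a single attraction would be split off from the rest of the group, so a tolerance for small differences would be needed to keep such members together.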
1b.
Continuing with Saturday’s data, we investigated which attraction categories Group Size 4 is most and least likely to visit. Figure 4 indicates that the most popular attraction category is Thrill Rides with 1200 visits, followed by Shows and Entertainment with 1000 visits, and Kiddie Rides with 200 visits.
Figure 4: Group Size 4 Preference Data for Saturday
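A category tally of this kind might be computed as sketched below. This is an illustration under stated assumptions rather than our actual query: the function name and input layout are hypothetical, one visit is counted per group instance per check-in, and the category labels for attractions 1, 5, 7, 8, and 13 are the ones given in the Figure 8 discussion.

```python
from collections import Counter


def visits_by_category(group_visits, attraction_category):
    """Count visits per attraction category, most visited first.

    group_visits: list of attraction-ID sequences, one per group instance.
    attraction_category: {attraction_id: category_name}.
    Both layouts are assumptions for illustration.
    """
    counts = Counter()
    for attraction_ids in group_visits:
        for attraction_id in attraction_ids:
            counts[attraction_category.get(attraction_id, "Unknown")] += 1
    return counts.most_common()


# Category labels as stated for the attractions discussed with Figure 8.
attraction_category = {
    1: "Thrill Rides",
    5: "Thrill Rides",
    7: "Thrill Rides",
    8: "Thrill Rides",
    13: "Kiddie Rides",
}

# Attraction sequence of the Group Size 4 instance plotted in Figure 8.
ranking = visits_by_category([[5, 8, 13, 5, 7, 1]], attraction_category)
```

Passing all instances of a group size instead of a single one yields the kind of totals plotted in Figure 4.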
1c.
In Saturday’s data, the shares of instances for Group Size 3 and Group Size 4 are comparable, at 18.05% and 14.95%, respectively, as shown in Figure 5. Likewise, Group Sizes 30 to 43 are comparable, at 0.05% each. The larger the group, the less likely it is to occur in the dataset. Combined, Group Sizes 3 and 4 account for about one third of all group instances.
Figure 5: Saturday Visitors Grouped by Size showing Relative Frequency
1d.
The box plot in Figure 6, for Saturday’s data, compares Group Size 4 with Group Sizes 2, 3, and 5. Instances of any of these group sizes are likely to visit 16 rides. Likewise, instances of Group Sizes 6 and 7 are likely to visit 23 rides. Similar patterns are evident for other group sizes.
Within each group type, all members move together and visit the same sequence of attractions, except for a few anomalous instances identified in Figure 10.
Figure 6: Comparison of Group Size 4 with Groups of Different Sizes
1e.
Further analyzing Saturday’s data for Group Size 4, we found, as shown in Figure 7, that the most preferred attraction type is Thrill Rides and the least preferred is either Kiddie Rides or Rides for Everyone. Plots are presented for four of the 314 instances. The same trend was evident in a large number of instances.
Figure 7: Preferred Attraction Category for Group Size 4
1f.
The movement of one Group Size 4 instance (Instance #4) from Saturday’s data is plotted in Figure 8. The traverse diagram on the left shows that the members of this group instance travelled together throughout the park, as indicated by the pink box at each attraction ID. They started at the entry point (0), moved to attraction 5 (thrill ride), continued to attraction 8 (thrill ride), then attraction 13 (kiddie ride), returned to attraction 5 (thrill ride), went on to attraction 7 (thrill ride), and ended at attraction 1 (thrill ride). In addition to the attraction sequence, the figure shows the distance traversed between attractions by means of link thickness: the thicker the link, the longer the distance. The location map on the right shows the corresponding attraction sites with respect to the park guide.
Our suggestion for improving the customer experience is to co-locate the Thrill Rides rather than disperse them geographically. For instance, as seen in the location map, the distance from attraction 5 to attraction 8 is large relative to other pairs of attractions.
Figure 8: Group Movement Data for a Group Instance
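The link-thickness encoding in the traverse diagram can be illustrated as follows. This is only a sketch: the attraction coordinates are invented for the example (they are not the park's real coordinates), and the straight-line distance and the linear width scale are assumptions, not necessarily what our D3 explorer uses.

```python
import math


def link_lengths(path, coords):
    """Straight-line distance for each consecutive pair of stops on a path."""
    return [
        ((a, b), math.dist(coords[a], coords[b]))
        for a, b in zip(path, path[1:])
    ]


def link_widths(lengths, min_width=1.0, max_width=8.0):
    """Scale each distance linearly so the longest link gets max_width."""
    longest = max(d for _, d in lengths)
    return [min_width + (max_width - min_width) * d / longest for _, d in lengths]


# Attraction sequence of Instance #4; entry point is 0.
path = [0, 5, 8, 13, 5, 7, 1]

# Hypothetical coordinates, chosen only to make the arithmetic easy to follow.
coords = {0: (0, 0), 5: (3, 4), 8: (9, 12), 13: (9, 9), 7: (3, 6), 1: (0, 2)}

lengths = link_lengths(path, coords)
widths = link_widths(lengths)
```

With these invented coordinates the 5-to-8 hop is the longest and receives the maximum width, mirroring the observation about attractions 5 and 8 in the location map.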
MC1.2 – Are there notable differences in the patterns of activity in the park across the three days? Please describe the notable differences you see.
Limit your response to no more than 3 images and 300 words.
2.
The visitor patterns at locations 32, 63, and 64 differ across the three days, as shown in Figure 9. For instance, on Sunday, visitor recordings at location 32 dropped to zero after 11:59 am, and similar behavior is seen at location 63 after 10:59 am. Also, visitor recordings at location 64 are, for the most part, slightly higher on Sunday than on Saturday. Saturday shows higher visitor attendance at locations 32 and 63 than Friday or Sunday.
Figure 9: Visitor Patterns for Locations 32, 63 and 64
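A drop to zero like the one at location 32 can be spotted by binning recordings by hour and finding the last hour with any activity. The sketch below assumes a hypothetical record layout of ("HH:MM", location_id) for a single day, and the toy records are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(records, location):
    """Count recordings per hour of day at one location.

    records: iterable of ("HH:MM", location_id) pairs for a single day
    (a hypothetical layout for illustration).
    """
    counts = Counter()
    for timestamp, loc in records:
        if loc == location:
            counts[datetime.strptime(timestamp, "%H:%M").hour] += 1
    return counts


def last_active_hour(counts):
    """Return the latest hour with at least one recording, or None."""
    return max(counts) if counts else None


# Toy Sunday records (invented): location 32 goes quiet after 11:59.
sunday = [
    ("09:15", 32), ("10:40", 32), ("11:59", 32),
    ("13:05", 64), ("16:20", 64),
]
counts_32 = hourly_counts(sunday, 32)
cutoff = last_active_hour(counts_32)
```

Comparing the cutoff hour for the same location across Friday, Saturday, and Sunday highlights days on which a location stops recording visitors early.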
MC1.3 – What anomalies or unusual patterns do you see? Describe no more than 10 anomalies, and prioritize those unusual patterns that you think are most likely to be relevant to the crime.
Limit your response to no more than 10 images and 500 words.
3.
Analyzing Friday’s data for Group Size 10, which has 11 instances, we observed that the members of every group instance except Instance #6 move through the park in sync (visiting attractions in the same sequence) for the whole day, as shown in Figure 10. In Group Instance 6, member 410025 missed one attraction (#27, at 15:04) out of the group's 23 attractions, while the member's peers visited it. Similar anomalies were observed on Saturday for member 565489 (Group Size 10, Group Instance 11), who missed attraction 5 at 11:06 am, and for member 810466 (Group Size 10, Group Instance 9), who missed attraction 81 at 18:27.
Figure 10: Anomalies for Group Size 10
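One way to flag such a member automatically is to compare each member's check-ins with those shared by the majority of the group. The sketch below is an assumption about how this could be done, not a description of our tooling. Only member 410025 and the missed check-in (attraction 27 at 15:04) come from the analysis above; the peer IDs and the other check-ins are hypothetical.

```python
from collections import Counter


def missed_checkins(group):
    """Find check-ins that most of a group made but a member did not.

    group: {member_id: set of (time, attraction_id)}; a hypothetical layout.
    Returns {member_id: sorted list of missed (time, attraction_id)}.
    """
    tally = Counter(c for visits in group.values() for c in visits)
    # A check-in counts as shared if more than half the members made it.
    majority = {c for c, n in tally.items() if n > len(group) / 2}

    missed = {}
    for member, visits in group.items():
        gap = sorted(majority - visits)
        if gap:
            missed[member] = gap
    return missed


# Toy group instance: peers and the 14:30 / 15:40 check-ins are invented.
group_instance = {
    410025: {("14:30", 12), ("15:40", 30)},
    "peer_a": {("14:30", 12), ("15:04", 27), ("15:40", 30)},
    "peer_b": {("14:30", 12), ("15:04", 27), ("15:40", 30)},
}
anomalies = missed_checkins(group_instance)
```

The majority threshold keeps a single absent member from redefining what the group did, so the check works even when the group's full itinerary is not known in advance.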